Human gait recognition is a biometric technique that has been used for
security purposes for the last decade. It is an appealing biometric modality
that aims to identify individuals based on the way they walk. The novel
coronavirus (COVID-19) has spread across the world, and the number of
people infected is rising rapidly. Even though some vaccines have been
developed to minimize the effects of COVID-19, deep learning-based gait
recognition techniques have shown themselves to be an effective tool for
identifying individuals wearing face masks during the pandemic, and in this
way they play a part in reducing the rate at which COVID-19 spreads. Deep
learning methods currently dominate the state of the art in gait
recognition and have fostered real-world applications. The main objective of
this paper is to provide a comprehensive overview of recent advancements in
gait recognition with deep learning, including datasets, test protocols,
state-of-the-art solutions, challenges, and future research directions. The
purpose of this discussion is to identify current challenges that need to be
addressed and to suggest directions for future research.
1. INTRODUCTION

Recent years have seen the rise of a new coronavirus disease known as COVID-19, which is caused by
the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This virus has caused a pandemic around
the world [1]. Both the number of people infected and the mortality rate are rising rapidly.
At the time of writing, more than 108,000,000 persons have been reported as infected with COVID-19,
around 2,400,000 have died, and approximately 80,000,000 patients have recovered [2]. The widespread
transmission of COVID-19 has resulted in the quarantining of a significant section of the world's population
and the destruction of a broad variety of industrial sectors, both of which have contributed to the current
state of global economic instability. Fever, dry cough, myalgia, dyspnea, and headache are among the most
common symptoms of the new coronavirus [3].

In recent decades, a subfield of biometric identification known as gait recognition has emerged. This
subfield focuses on identifying individuals from their intrinsic patterns of movement, based on physical
characteristics such as their size and the relationships between the trunk and the limbs, as well as
spatio-temporal information about those movement patterns [4]. Such a strategy is of great use in
surveillance systems or foggy settings, for example, where the distinguishing characteristics commonly used
for biometric identification, such as fingerprints and facial features, are difficult or impossible to
differentiate.

In addition, techniques based on gait recognition offer certain advantages over other biometric
identification models because they are difficult to spoof. The difficulty stems mostly from features
inherent to the approach itself: identification is based on the silhouette and its movement, both of which
are notoriously difficult to reproduce. This does not hold for other techniques, where a person may defeat
the system by, for example, hiding their face. Gait patterns can also be detected without high-resolution
images or specialized devices, unlike fingerprints and iris scans. Finally, gait recognition techniques do
not require the individual being analyzed to cooperate with the identification system [5], in contrast with
other approaches, which require the person being identified to work together with the system.

Similarly, gait information can easily be obtained from a distance, which is a major advantage over
other techniques, particularly in situations where identification must proceed without the assistance of
the person being analyzed, such as criminal investigations [6]. The data can also be acquired without
expensive, specialized hardware, which often lowers the overall cost. Acquisition has become easier as
surveillance technologies and smartphones with accelerometers have become more common, since gait
signals can readily be extracted from these devices.

Identifying people based on how they walk and move is not at all an easy process, even though
the instruments discussed above are quite straightforward. The intricacy of the task means that standard
gait recognition approaches, which rely on data pre-processing and manually designed features for
subsequent recognition, have many limitations [7]. For instance, they need to perceive the whole gait, cope
with large intra-class variance, occlusions, and shadows, and locate the body segments. In recent years, a
trend in machine learning known as deep learning has emerged as a revolutionary tool for solving problems
in image and sound processing, computer vision, and speech processing, performing significantly better
than virtually any previously established baseline. This paradigm eliminates the need for seasoned
professionals to extract representative features manually. It also gives the best results for gait
recognition, overcoming earlier problems and opening the door to further research.

Several studies have been published in recent years summarizing the main achievements in biometrics
and gait recognition. Biometric identification in general has made significant progress over the past few
decades, as demonstrated by [8]-[10]. On the other hand, [11] discussed novelties in brain biometrics.
Both [12] and [13] provided comprehensive surveys covering general topics in gait recognition. Meanwhile,
[14] examined vision-based gait recognition, while [15] proposed a similar concept for smart visual
surveillance. Furthermore, a summary of the work on gait-based re-identification of individuals was
published by [16] in 2019. In contrast, [17] investigated how wearable sensors could be used to detect gait
movements. Despite this, none of these surveys has focused on the main advances in deep learning-based
approaches that have made an impact on this field in recent years.

As more research is conducted on biometrics and gait recognition, a growing number of authors have
contributed papers summarizing the most significant advancements in each subject [18]. According to the
survey, the majority of the works used convolutional neural networks (CNNs) [19] and long short-term
memory (LSTM) [20], [21], while some novel and relevant architectures are left behind, such as
autoencoders [22], capsule networks (CapsNet) [23], deep belief networks [24], and generative adversarial
networks (GANs) [25]. The primary goal of this study is therefore to present the most recent and
significant publications, together with the most effective deep learning-based strategies and methods for
gait recognition.

The study also aims to give the reader a thorough and illustrated theoretical background on the topic,
covering its antecedents in biometric recognition, the most prevalent gait feature extraction methods and
architectures used to address the related problems, including their limitations, and the datasets that may
be used for gait recognition, with images, classifications, and descriptions. Among the publications
considered for examination, we focused on gait recognition employing deep learning approaches, such as
convolutional neural networks and LSTM. The generic term deep learning was also substituted by the names
of the architectures themselves. As a third filter, concerning the novelty of the work, publications
presenting comparable methodologies and designs were not discarded. The research used in this study was
published no more than five years ago [26].
2. METHODS

This section introduces the theoretical framework of the deep learning algorithms used for gait
identification, as well as the gait recognition problem itself. It provides an overview of some of the most
popular deep learning architectures that have been used to recognize gaits, together with some options
that may be adopted in the future. Several approaches to recognizing individuals by their biological
characteristics are also discussed.


2.1. The use of deep learning approaches considered in the recognition of gaits 

Although traditional machine learning strategies and gait recognition algorithms have produced
relatively satisfactory results in recognizing walking patterns over the past few years, these methods are
generally limited to handcrafted features and cannot discover intrinsic patterns within the data. In this
regard, deep learning-based algorithms have emerged as an elegant option for solving image, video, and
sequence problems, among other challenges, and have shown themselves to be a potent tool for gait
recognition. The purpose of this section is therefore to provide an overview of some of the most popular
deep learning architectures that have been used to recognize gaits: recurrent neural networks (RNN), CNN,
autoencoders, capsule networks, generative adversarial networks, and deep belief networks (DBN) [27].


2.1.1. Convolutional neural networks 

Since the beginning of the 2010s, CNNs have gained an extraordinary degree of popularity [28]. They
have been indispensable in solving image processing problems such as image classification and
segmentation. The fundamental building blocks of a CNN are convolutional neurons, which typically consist
of 3 × 3 or 5 × 5 kernels. These kernels perform convolution operations on the input data, producing a new
group of matrices (feature maps) that are passed to the model's successive layers. In signal processing, the
term "convolution" refers to combining two signals to produce a third: one signal is slid across the other,
and at each position the element-wise products are summed. The concept is shown in Figure 1.
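
As a minimal sketch of the 3 × 3 operation described above, the following NumPy code slides a kernel over a small input and sums the element-wise products at each position. The input values, the kernel, and the function name `conv2d_valid` are illustrative assumptions and are not taken from any of the surveyed works. As in most deep learning libraries, the kernel is applied without flipping, which strictly speaking is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5 x 5 input and a 3 x 3 vertical-edge kernel (both made up for illustration).
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every entry is -6.0 for this input
```

Because the toy input increases by one from column to column, the edge kernel returns the same response everywhere; on a silhouette image the response would be large only near vertical boundaries.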

Therefore, a CNN may be seen as a stack of convolutional kernels, each generating the input for the
next layer. The stack may include intermediate pooling layers, and a fully connected layer is attached at
the top of the architecture for classification. Learning is carried out by backpropagation in conjunction
with gradient descent. The architecture of a CNN, inspired by LeNet-5, is shown in Figure 2.
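
The pooling and classification stages mentioned above can be sketched as follows. This is only an illustrative forward pass under stated assumptions: the feature map is random, the three-class weight matrix `W` is hypothetical, and no training (backpropagation) is shown.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Small check of the pooling step on a known input.
demo = np.arange(16.0).reshape(4, 4)
pooled_demo = max_pool2d(demo)             # [[5, 7], [13, 15]]

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8))  # stands in for a convolutional layer output
pooled = max_pool2d(feature_map)           # shape (4, 4)
flat = pooled.reshape(-1)                  # flatten before the fully connected layer

W = rng.standard_normal((3, flat.size)) * 0.1   # hypothetical 3-class classifier weights
b = np.zeros(3)
probs = softmax(W @ flat + b)              # class probabilities that sum to 1
```

In a trained network, `W` and `b` (and the convolutional kernels) would be the quantities adjusted by backpropagation and gradient descent.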


Figure 1. An example of a 3 × 3 convolution


2.1.2. Capsule networks 

CNNs are particularly effective at interpreting visual features, yet they tend to be confused by the
spatial relationships among multiple features. In other words, a trained CNN typically recognizes a dog if
it finds a dog's body, face, and tail, even if these parts are arranged in a different order or are located
in different areas of the image. To solve this issue, the authors of capsule networks propose a
hierarchical strategy [29]. In a nutshell, the model consists of two components. The first is a
convolutional encoder, which is responsible for identifying low-level features. The second is a fully
connected linear decoder, which employs an algorithm called routing by agreement to route such low-level
information to the appropriate place in a higher level of the hierarchy [30]. Capsules are thus more robust
to object orientation. In addition, they may perform better at distinguishing several objects that overlap
in a scene.

Indonesian J Elec Eng & Comp Sci, Vol. 30, No. 2, May 2023: 882-902
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
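
To make the routing-by-agreement idea more concrete, the sketch below follows the commonly described dynamic routing procedure: each lower-level capsule makes a prediction for every higher-level capsule, and predictions that agree with the resulting output receive a larger coupling coefficient in the next iteration. The tensor sizes, the random predictions `u_hat`, and the number of iterations are assumptions chosen for illustration; this is not the implementation used by any of the surveyed works.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_by_agreement(u_hat, n_iters=3):
    """u_hat[i, j, :] is the prediction of lower capsule i for higher capsule j."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits, start uniform
    for _ in range(n_iters):
        c = softmax(b, axis=1)                     # coupling coefficients per lower capsule
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # higher-level capsule outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # reward predictions that agree with v
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((6, 3, 4))   # 6 lower capsules, 3 higher capsules, 4-D vectors
v, c = route_by_agreement(u_hat)
```

The length of each row of `v` can be read as the confidence that the corresponding higher-level entity is present, while its direction encodes properties such as pose.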


Figure 2. Example of a standard CNN architecture (stages labeled in the figure: Maxpool × 4, FC × 2, Output)


2.1.3. Recurrent neural networks 

Gait identification has been approached in a number of studies as a sequence of images
characterizing an individual's movement. RNNs are a typical way of handling such sequences with deep
learning. These networks compute each neuron's activation from the input data as well as from the output
of other neurons, in a recurrent manner. Optionally, this architecture may be combined with a CNN to
extract additional information from each input image before inference. Because describing a person's gait
often involves a significant amount of sequential information, a group of RNNs known as gated RNNs is
better suited to the job owing to its capacity to cope with long sequences. Two primary designs can be
mentioned here, namely the gated recurrent unit (GRU) and the long short-term memory (LSTM) [31].
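
The recurrence described above can be sketched with a plain (ungated) RNN, in which the hidden state at each time step depends on the current input and on the previous hidden state. The dimensions and random weights below are assumptions for illustration; in a gait setting each input vector could be, for example, a per-frame feature vector produced by a CNN.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence xs of shape (T, D); returns hidden states (T, H)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        # New state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4                       # sequence length, input size, hidden size (arbitrary)
xs = rng.standard_normal((T, D))
W_xh = rng.standard_normal((H, D)) * 0.5
W_hh = rng.standard_normal((H, H)) * 0.5
b_h = np.zeros(H)

states = rnn_forward(xs, W_xh, W_hh, b_h)
```

The last row of `states` summarizes the whole sequence and could be passed to a classifier; gated variants replace the single `tanh` update with the gate mechanisms discussed next.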


2.1.4. Long short-term memory

The long short-term memory (LSTM) was originally designed with the primary goal of improving results
on long sequences. In a nutshell, LSTMs function similarly to the conventional RNN: the output of a
particular neuron is determined by recurrent information from the neurons that came before it. The primary
distinction lies in the architecture of the LSTM cell, which is characterized by more intricate
interactions. Such a cell is made up of three primary gates that control the flow of information. These
gates are as follows:

— The forget gate: This gate decides how much of the stored information should be kept; it is called the
forget gate because it reduces the amount of information that is stored. The data from the previous state
and the current input are fed into a sigmoid function, which produces values between 0 and 1. The closer
the value is to 1, the more information is retained.

— The input gate: This gate computes a new value in order to bring the cell state up to date. It takes into
account two primary values: i) a relevance weight computed from the previous hidden state and the current
input by a sigmoid function; and ii) the candidate value, obtained by sending the original value to a
hyperbolic tangent (tanh) function, which squashes it to between -1 and 1. The product of these two values
is the update that is added to the cell state.

— The output gate: After the values of the forget and input gates have been determined, the output gate
determines the value of the cell's output. The procedure is carried out as follows: i) the contributions of
the forget gate and the input gate are added together to form the new cell state, which is then sent to a
tanh function; ii) the previous hidden state and the current input are sent to a sigmoid function; and
iii) the sigmoid function's output is multiplied by the output of the tanh function to produce the output
of the cell.

— Gated recurrent unit: The gated recurrent unit (GRU) was originally proposed as a means of improving the
results of neural machine translation. A GRU is very similar to an LSTM in that it has internal gates that
regulate the flow of data through it. The primary distinction between the two models is the number of
gates: a GRU has only two gates, known as the reset and update gates, whereas an LSTM has three. Studies
have shown that the GRU can achieve results similar to those of the LSTM despite using fewer gates. The GRU
therefore offers the advantage of a reduced computational burden and faster training and inference.
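
The gate mechanisms listed above can be sketched as single-step update functions. The code follows the commonly used formulations of the LSTM and GRU cells; the weight layout (one stacked matrix for the LSTM, three matrices for the GRU), the dimensions, and the random values are assumptions made for illustration, and the GRU gates are named update and reset as in the usual formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, D+H); rows are ordered forget, input, candidate, output."""
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])              # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2 * H])          # input gate: how much of the candidate to write
    g = np.tanh(z[2 * H:3 * H])      # candidate values in (-1, 1)
    o = sigmoid(z[3 * H:4 * H])      # output gate
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state (the cell's output)
    return h, c

def gru_step(x, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step with an update gate z and a reset gate r."""
    xh = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ xh + b_z)
    r = sigmoid(W_r @ xh + b_r)
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
D, H = 6, 4                                   # input and hidden sizes (arbitrary)
x, h0, c0 = rng.standard_normal(D), np.zeros(H), np.zeros(H)

W, b = rng.standard_normal((4 * H, D + H)) * 0.1, np.zeros(4 * H)
h_lstm, c_lstm = lstm_step(x, h0, c0, W, b)

W_z, W_r, W_h = (rng.standard_normal((H, D + H)) * 0.1 for _ in range(3))
b_z, b_r, b_h = np.zeros(H), np.zeros(H), np.zeros(H)
h_gru = gru_step(x, h0, W_z, W_r, W_h, b_z, b_r, b_h)

lstm_params = W.size + b.size                                              # 4 * H * (D + H) + 4 * H
gru_params = W_z.size + W_r.size + W_h.size + b_z.size + b_r.size + b_h.size  # 3 * H * (D + H) + 3 * H
```

With these sizes the LSTM cell has 176 parameters and the GRU cell 132, which illustrates where the GRU's reduced computational burden comes from.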


2.1.5. Autoencoders 
Autoencoders are a kind of generative neural network often used for dimensionality reduction and
image denoising. The model may be broken down into two primary stages, which are as follows:
— Encoder: This component encodes the input information into a feature space that is typically more
compact.
— Decoder: This component reconstructs the input from the encoded data in an unsupervised manner.
Figure 3 depicts the model's architecture.


Figure 3. Example of a standard autoencoder architecture (labels in the figure: input data, encoder, bottleneck (latent space), decoder)
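
As a minimal sketch of the encoder, bottleneck, and decoder stages, the code below trains a linear autoencoder with plain gradient descent on random toy data. The data, the layer sizes (8 features compressed to 3), the learning rate, and the number of steps are all assumptions for illustration; practical autoencoders for gait data use nonlinear, often convolutional, layers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))            # toy data: 200 samples, 8 features
W_enc = rng.standard_normal((8, 3)) * 0.1    # encoder: 8 -> 3 (bottleneck)
W_dec = rng.standard_normal((3, 8)) * 0.1    # decoder: 3 -> 8

def reconstruct(X, W_enc, W_dec):
    Z = X @ W_enc            # latent code in the bottleneck
    return Z, Z @ W_dec      # reconstruction of the input

def mse(X, X_hat):
    return np.mean((X - X_hat) ** 2)

_, X_hat = reconstruct(X, W_enc, W_dec)
loss_before = mse(X, X_hat)

lr = 0.1
for _ in range(500):
    Z, X_hat = reconstruct(X, W_enc, W_dec)
    err = 2.0 * (X_hat - X) / X.size         # gradient of the MSE w.r.t. the reconstruction
    grad_dec = Z.T @ err                     # gradient w.r.t. decoder weights
    grad_enc = X.T @ (err @ W_dec.T)         # gradient w.r.t. encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

Z, X_hat = reconstruct(X, W_enc, W_dec)
loss_after = mse(X, X_hat)
```

No labels are used at any point: the input itself is the training target, which is what makes the procedure unsupervised. The rows of `Z` are the compressed features that a downstream recognizer would consume.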


2.1.6. Deep belief networks 

Deep belief networks are stochastic neural networks suited for generative tasks. They are structured
so that each layer is a restricted Boltzmann machine (RBM) trained in a greedy, layer-wise fashion. In a
nutshell, an RBM is a graphical model made up of a visible set of units and a latent set of units, referred
to as the visible layer and the hidden layer, respectively. These units are connected to one another by a
weight matrix, and there are no connections between neurons in the same layer. Learning consists of two
steps: attaching a set of input data to the model's visible layer, and searching for a representation of
that data in the model's hidden units.

The training phase is carried out by minimizing the energy of the system, which is often
accomplished by using a Markov chain technique in conjunction with Gibbs sampling. For classification
tasks, the most common approach is to attach a softmax layer on top of the DBN architecture and, after
greedily pre-training all of the RBMs, fine-tune the weights using backpropagation so that the weight
matrices are adjusted to fit the labels correctly.
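
The Gibbs-sampling-based training of a single RBM layer is commonly approximated with one step of contrastive divergence (CD-1). The sketch below shows one such update for a binary RBM; CD-1 is named here as a standard choice rather than as the procedure of any specific surveyed work, and the batch, layer sizes, and learning rate are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One contrastive divergence (CD-1) update for a binary RBM on a batch v0 of shape (n, n_vis)."""
    p_h0 = sigmoid(v0 @ W + b_hid)                        # hidden probabilities given the data
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)    # Gibbs step: sample the hidden units
    p_v1 = sigmoid(h0 @ W.T + b_vis)                      # reconstruct the visible layer
    p_h1 = sigmoid(p_v1 @ W + b_hid)                      # hidden probabilities given the reconstruction
    n = v0.shape[0]
    # Move towards the data statistics and away from the reconstruction statistics.
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

rng = np.random.default_rng(0)
v0 = (rng.random((16, 6)) < 0.5).astype(float)   # toy batch: 16 binary vectors, 6 visible units
W = rng.standard_normal((6, 4)) * 0.01           # 6 visible units, 4 hidden units
b_vis, b_hid = np.zeros(6), np.zeros(4)

W_new, b_vis_new, b_hid_new = cd1_update(v0, W, b_vis, b_hid, rng)
```

In a DBN, this update would be repeated over many batches for the first RBM, whose hidden activations then serve as the visible data for the next RBM in the stack.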


2.1.7. Generative adversarial networks 

In recent years, generative adversarial networks have gained a lot of popularity thanks to their
remarkable capacity to produce realistic synthetic images. Two separate networks are used in this model.
The first is a generator, which learns the distribution of the data and generates synthetic samples; the
second is a discriminator, which tries to determine whether a particular instance is real or synthetic. The
two networks compete with one another: the generator attempts to create samples convincing enough to
mislead the discriminator, while the discriminator becomes better and better at identifying these fake
images. The model is illustrated in Figure 4.
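
The competition between the two networks can be expressed through their loss functions. The sketch below computes the binary cross-entropy losses commonly used for GAN training, given only the discriminator's output probabilities; the networks themselves are omitted, the probabilities are made-up numbers, and the generator loss is written in the widely used non-saturating form, which is an assumption rather than a detail taken from the surveyed works.

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

def discriminator_loss(d_real, d_fake):
    # The discriminator should output 1 on real samples and 0 on generated ones.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # Non-saturating form: the generator wants the discriminator to output 1 on its samples.
    return bce(d_fake, np.ones_like(d_fake))

# Discriminator outputs (probability of "real") in two hypothetical situations.
d_real = np.array([0.9, 0.8])
d_fake_detected = np.array([0.1, 0.2])   # the discriminator spots the fakes
d_fake_fooled = np.array([0.9, 0.8])     # the generator fools the discriminator

d_loss_good = discriminator_loss(d_real, d_fake_detected)
d_loss_bad = discriminator_loss(d_real, d_fake_fooled)
g_loss_detected = generator_loss(d_fake_detected)
g_loss_fooled = generator_loss(d_fake_fooled)
d_loss_chance = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))  # 2 * ln 2
```

The generator's loss falls exactly when the discriminator's loss rises, which is the adversarial relationship described above; training alternates gradient steps on the two losses.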


2.1.8. Summary of the deep learning techniques 

Table 1 provides an overview of the deep learning techniques considered in this paper and of how
they are typically used in the context of gait recognition. The same work may employ more than one
architecture, in which case it appears in the table more than once.




Figure 4. Example of a standard GAN architecture (labels in the figure: inputs, generated/fake images, real image)


Table 1. Gait recognition approaches based on deep learning, organized according to the type of neural network

Technique                              Task
Convolutional neural networks (CNN)    [...] of a video.
Autoencoder                            Works by compressing and decompressing features from the input.
Capsule                                Improve the semantic organization of the outputs from a CNN.
Deep belief networks                   Encode features and patterns into compressed representations.
Generative adversarial networks        A training method that relies on the differentiation of an original input and a
                                       generated counterpart from a model, such as a CNN.
Recurrent neural networks              Comprise both GRUs and LSTMs, which are composed of several gates to control the
                                       flow of information and are employed to deal with temporal information.


2.2. Gait recognition for human 

A number of approaches to recognizing persons based on their biological traits have been discussed
up to this point. These methods are considered reliable and safe, as evidenced by their success in
governance systems of the banking and public sectors. However, two primary drawbacks need to be
emphasized: i) these systems require the individual to supply personal biometric information, since
recognition is only possible once the required information has been provided or registered by the
individual; and ii) some of these systems require specialized equipment in order to operate.

Gait recognition models might be an alternative solution to such problems, particularly when
video-based techniques are considered. In most cases, acquiring the biometric information relies only on a
camera, without any special features or invasive sensors. Because such information is collected passively,
these methods do not experience the issues outlined above, although passive collection comes at some cost
to privacy.

As a result, the observed individual is no longer needed to help with the identification process
or to offer any information. Video-based recognition, although the most natural approach, is not the only
option. Gait may also be identified from other features, such as the footprint, which takes into account
the pressure and size of the contact region. The cost of the technology needed for such functionality is
undeniably one of the most notable drawbacks of these models [32].

In addition, the approaches developed for gait identification may be separated into two primary
categories: those that are template-based and those that are not. Template-based methods aim to
characterize the movement of the trunk or of the legs; more specifically, they are based on the dynamics of
the movement through space, or on a spatio-temporal perspective [33]. A number of techniques can be
employed to extract information from walking paths, gait information images, and gait energy images, among
them canonical correlation analysis (CCA) and joint sparsity models (JSM). Approaches that do not depend
on templates, on the other hand, consider the shape and its attributes to be the most important traits; a
person is identified by means of measures that represent the shape.

Figure 5 depicts the types of sensors and devices that may be used to collect gait information. The
image shows a laboratory-like environment designed with several embedded devices for gait data
acquisition. The scene contains sidewall cameras and a multispectral photographic sensor. During the walk,
the relationship between space and time is measured by velocity sensors and gyroscopes mounted on the
roof. The setup is completed by a carpet-like sensor embedded in the floor, which measures the pressure
that the foot exerts on the ground.


Figure 5. An example of a gait data collection setting


Among these devices, cameras are the most commonly used because they are readily accessible and
allow data to be acquired over a greater distance. Multispectral photographic sensors are also very
effective but, owing to their high cost, are less widely accessible than other types of cameras. On the
roof, the picture also includes gyroscopes and velocity sensors, both of which characterize the
spatio-temporal context of the acquisition. Finally, a carpet-like step-pressure sensor is used to measure
the walking rhythm [34].

This sensor is very useful for detecting gait irregularities, such as those caused by Parkinson's
disease and other neurodegenerative conditions. Deep learning-based techniques have brought about a
paradigm change in the area of gait recognition, leading to significant results in a number of
applications. The next part therefore discusses in depth the studies reviewed in the context of deep
learning-based gait identification systems [35].


2.3. Discussion 

Table 2 gives an overview of the studies surveyed in this part. For each research article it lists
the publication year, the model used, the input type, the dataset used, the result obtained, and the
measure in which the result is expressed: identification rate, accuracy, prediction error, or
root-mean-square error (RMSE). In this section, we explore in more depth how deep learning approaches can
be used to recognize gait patterns.

Convolutional neural networks are, in general, the most common option among deep learning
solutions, particularly for image- and video-based problems such as gait recognition. This tendency is to
be expected, given that CNNs have achieved exceptional results in a variety of applications and have won
the majority of benchmark competitions over the last several years. Nevertheless, each of the other designs
makes a significant contribution to the area by outperforming the rest on particular tasks. For example,
capsule networks can extract partial gait representations in a hierarchical form, which enables them to
provide superior results in scenarios where persons or items appear in the scene with different
orientations or overlap. In a similar vein, recurrent neural networks are indispensable for sequential data
such as videos, and they are therefore an indispensable instrument for gait identification.

Although the vast majority of deep learning-based approaches to gait recognition use images or
videos as their source data, many studies have shown that other data sources, such as accelerometers,
gyroscopes, sensor-based features in general, and handcrafted features, can also produce reliable results.
The majority of these studies focus on unsupervised deep learning methods, such as autoencoders and DBNs,
which tend to be more expressive across various data sets, whereas CNNs are essential when working with
raw image and video files. These unsupervised approaches are able to extract information about the data
distribution and transform it, often into a lower-dimensional space. As a result, they are able to provide
gait recognition with more representative features.


Table 2. Approaches to the recognition of gait based on deep learning

Ref.  Year  Model                   Input type                   Dataset                         Result   Measure
36    2016  CNN                     GEI                          OU-ISIR                         94.6%    Identification rate
37    2017  CNN                     Cross-view                   CASIA B                         90.8%    Accuracy
38    2017  [illegible] Bayesian    Sensors                      OU-ISIR                         97.6%    Identification rate
39    2017  CNN                     Optical flow                 TUM-GAID and CASIA B            97.52%   Accuracy
40    2017  CNN Siamese networks    Cross-view                   OU-ISIR                         98.8%    Accuracy
41    2017  [illegible]             Optical flow                 [illegible]                     99.8%    Identification rate
42    2017  Autoencoders + PCA      GEI                          CASIA B and SZU RGB             97.58%   Identification rate
31    2018  CNN + LSTM              [illegible] and gyroscope    whuGAIT and OU-ISIR             99.75%   Accuracy
44    2018  GAN                     PEI                          [illegible]                     94.7%    Accuracy
45    2018  DBN                     Sensors                      Data collected by the authors   93%      Accuracy
46    2019  Capsule                 LBC and MMF                  OU-ISIR                         74.4%    Accuracy
47    2019  CNN                     GEI + data augmentation      CASIA B                         98%      Accuracy
48    2019  Autoencoders + LSTM     Cross- and frontal-view      CASIA B, USF, and FVG           99.1%    Accuracy
49    2019  Autoencoders + [...]    GEI                          OU-ISIR and CASIA B             96.15%   Accuracy
50    2019  LSTM                    Sensors                      Data collected by the authors   0.02     Prediction error
51    2019  GAN                     GEI                          CASIA A and CASIA B             82%      Recognition result
52    2020  LSTM                    GEI                          CASIA B and OU-ISIR             99.1%    Recognition rate
53    2020  CNN                     Cross-view                   OU-MVLP, OU-LP, and CASIA B     98.93%   Identification rate
54    2020  GAN                     Cross-view                   OU-MVLP and CASIA B             93.2%    Identification rate
55    2020  Capsule                 Multi-scale representations  CASIA-B and OU-MVLP             84.5%    Identification rate
56    2020  DBN                     Sensors                      Data collected by the authors   2.61     RMSE
57    2021  LSTM                    IMU                          whuGAIT and OU-ISIR             94.15%   Accuracy
58    2021  Capsule                 Sensors                      Several                         99.69%   Accuracy


It is worth noting that generative adversarial networks address a specific scenario: because they
can create synthetic data to train the models, gait systems can learn a wider range of conditions, such as
orientations, clothing, and the number of people present, than was previously possible. They are also
helpful for analysing fraud in gait-based systems, since fake images can be created to test the system's
security. The gait energy image (GEI) is one of the most common ways of describing gait image data. This
approach represents the image sequence of a gait cycle as a single energy image by means of the weighted
average method.

In addition, the sequences that make up a journey cycle are subjected to processing so that the binary 
silhouette may be. As a result, GEI preserves both the static and dynamic properties of human walking while 
greatly reducing the amount of computer effort required for picture processing. After doing a thorough 
investigation of the approach, one is able to identify a handful of features that are distinctive of the model 
which are as follows: 

— A unique representation of the human walk is a key part of the presentation, which does not soften the 
context of vector images in any way. 

— The motion of a human is captured in a single image while keeping temporal information at the same time. 

— GEIs are not very sensitive to silhouette noise in individual frames. 

In a similar vein, cross-view gait recognition is a common method for coping with multiple viewpoints. 
Because this input type requires a number of fully controlled cameras as well as a cooperative environment, 
its use in real-world circumstances is limited. In addition, before any combination is performed, it visually 
normalizes the gait characteristics, which enables the model to learn the correlations between the various 
views of the scene. An example of this method is the cross-view gait recognition system developed at 
Newcastle University in the United Kingdom. This system, known as DiGGAN, achieved state-of-the-art 
results on the largest multi-view gait dataset in the world (which includes more than 10,000 individuals) by 
employing cross-view approaches [59]. 
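
The averaging behind the gait energy image discussed above can be illustrated with a minimal sketch. It assumes that the binary silhouettes of one gait cycle are already aligned, share the same size, and receive equal weights; the function name `gait_energy_image` and the toy frames are illustrative only and are not taken from any of the surveyed works.

```python
from typing import List

Frame = List[List[int]]  # binary silhouette: 1 = body pixel, 0 = background


def gait_energy_image(silhouettes: List[Frame]) -> List[List[float]]:
    """Average the aligned binary silhouettes of one gait cycle pixel by pixel."""
    if not silhouettes:
        raise ValueError("at least one silhouette frame is required")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    for frame in silhouettes:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same size")
    n = len(silhouettes)
    # Equal weights (1/n) are an assumption of this sketch.
    return [
        [sum(frame[y][x] for frame in silhouettes) / n for x in range(width)]
        for y in range(height)
    ]


# Toy gait cycle of three 2x3 frames (hypothetical data).
cycle = [
    [[0, 1, 0], [1, 1, 0]],
    [[0, 1, 0], [0, 1, 1]],
    [[0, 1, 0], [1, 1, 1]],
]
gei = gait_energy_image(cycle)
```

Pixels that belong to the body in every frame take the value 1 (static information), whereas pixels covered only during part of the cycle take fractional values (dynamic information).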
In spite of the success of such approaches, they still fall short in certain areas, which other, less common 
methods try to address. For example, the period energy image (PEI) is a multichannel gait template that was 
proposed to address the view-angle variation of cross-view recognition and the loss of temporal information in 
GEI [60]. Optical flow is another strategy that has been used in various studies. This method makes it possible 
to determine which objects are seen within the scene, how they are arranged spatially, and how they change 
over time [61]. Finally, a broad variety of hand-crafted features and sensor-based input data types are also used 
for deep learning-based gait identification. Examples include data from inertial measurement units, 
accelerometers, gyroscopes, and other similar sensors. 

Regarding the current state of the art in gait recognition, solutions can be viewed from two main 
perspectives: one is literature-based, and the other is representation-based. The literature suggests that 
two-dimensional convolutional neural networks (2D-CNNs) are the most popular form of deep neural network 
(DNN) for gait recognition. In fact, approximately 50% of all solutions are based solely on 2D-CNN 
architectures for classification. The next most popular categories are 3D-CNNs and GANs, each accounting 
for 10% of the published material. Deep autoencoders (DAE), RNNs, DBNs, and graph convolutional networks 
receive less attention than other types of DNNs, accounting for 5%, 3%, 2%, and 1%, respectively. Hybrid 
methods account for 26% of the solutions. Within this category, CNN and RNN combinations are the most 
popular approach, accounting for approximately 15% of the solutions. Combinations of DAE with GAN and 
RNN are responsible for 10% of the methods, followed by RNN-CapsNet methods, which are responsible for 
2% of the solutions. When it comes to state-of-the-art solutions based on representation, silhouettes are used 
most often for gait identification, accounting for more than 85% of all solutions. Skeletons, although an 
encouraging approach, have been studied far less often than silhouettes and represent just 10% of the solutions. 
Some approaches, which account for around 5% of the available literature, use both skeleton and silhouette 
representations. These methods use either disentangled representation learning or score fusion algorithms, 
depending on the case. 

One of the primary challenges of deep learning-based gait recognition is that gait data are complex: 
they arise from the interaction of many factors, such as occlusion, camera viewpoint, the pose of individuals, 
the sequence of events, the motion of body parts, and the light sources present in the data, whilst other factors 
may not necessarily be present in the data [62]. These factors may interact in a convoluted manner, which 
makes gait recognition more difficult. At the moment, numerous approaches are available in other fields 
connected to pattern recognition, for example face recognition, emotion estimation, and pose estimation, and 
a growing number of approaches are being tested. These approaches aim to understand complex settings and 
to learn representations that separate the different explanatory factors in a high-dimensional space. However, 
the vast majority of the deep learning-based gait identification systems that are now available have not yet 
examined such methods. As a result, they are unable to directly separate the underlying structure of the gait 
data into its major discrete variables. There is still room for progress in the use of these techniques in gait 
recognition systems, in spite of the recent success in this area [63]. 


3. DATASETS 

The training and evaluation of machine learning models depend on a dataset that is relevant to the 
task. This is the case regardless of the learning paradigm being used, whether supervised, unsupervised, or any 
other. In addition, such datasets make it possible to determine how successful a technique is at solving a 
particular problem and to compare it with other potential solutions. 

When it comes to gait recognition, the number of datasets available is quite limited, whether public 
or private. The creation of new artificial intelligence models capable of identifying humans based on how they 
walk or move has been hampered by this scarcity of training data. Gait biometrics requires a reasonable 
quantity of movement records for each subject, which implies recording and producing many videos for each 
individual. This is a challenge, since doing so is time-consuming. In addition, videos of this kind have an 
inherently high dimensionality, which increases the size of the dataset and demands a large amount of storage 
space. 

It is necessary to obtain consent from each participant before extracting biometric data and making it 
publicly available. Releasing a dataset without the formal agreement of each person may lead to litigation. 
A number of datasets that are frequently used for gait recognition tasks are described in the following 
subsections. 


3.1. CMU MoBo dataset 

The CMU MoBo dataset contains a very limited quantity of data, consisting of video sequences for 
gait recognition taken from a total of 20 different individuals. Segmentation is simplified by the silhouette 
masks and bounding box information included in the dataset. The dataset can be downloaded without 
registration or signing any form of agreement; all that is required is a connection to the file transfer protocol 
(FTP) server at the University of Calgary. This is one of the most significant benefits offered by the dataset [64]. 


3.2. TUM GAID dataset 

The TUM gait from audio, image, and depth (GAID) database comprises red, green, and blue (RGB) 
images, audio, and depth data, which were captured from 305 individuals in three distinct variations. In 
addition, 32 individuals were recorded a second time in order to introduce further variation, bringing the total 
number of recordings to 3,370. The authors claim that their dataset is the only one of its kind that enables 
recognition by merging video, audio, and depth information. Shoes, carrying conditions (with or without a 
bag), and time are the variables considered. The first session took place in January 2012 and involved 176 
individuals, who wore jackets and snow boots. The second session took place in April 2012 with 161 
individuals, who wore noticeably different clothing because it was warmer at the time. Of these, 32 persons 
were recorded in both sessions, which allows for variation in both clothing and time. The presence of a 
clear-cut evaluation protocol is a significant advantage of this dataset [65]. 


3.3. HID-UMD dataset 

The human identification at a distance (HID)-UMD dataset comprises many videos of individuals 
walking, collected from four distinct viewpoints, together with the associated binary masks for foreground 
segmentation [66]. Its principal objective is to assist researchers in the development of novel gait and face 
biometrics. Figure 6 presents several image samples seen from a variety of viewpoints. The size of the whole 
dataset is roughly 2.2 gigabytes. 


Figure 6. Several picture samples seen from a variety of perspectives 


Furthermore, the dataset is an aggregate of two datasets. Dataset 1 comprises walking sequences of 
25 different persons in four separate poses: i) frontal view of a person walking towards the camera; ii) frontal 
view of a person walking away; iii) frontal-parallel view of a person walking toward the left; and iv) 
frontal-parallel view of a person walking toward the right [67]. The videos in Dataset 2 show 55 different 
people walking along a T-shaped walkway. The sequences were captured using two cameras positioned 
perpendicular to one another [68]. 


3.4. CASIA 
The CASIA gait database is a collection of four datasets intended specifically for gait recognition 
applications [69]. It is made available by the Institute of Automation of the Chinese Academy of Sciences 
(CASIA), and its details are as follows: 

— CASIA A: This dataset was created in December 2001 and was formerly known as the NLPR Gait 
Database. It consists of 20 different people, each with 12 videos, that is, four videos for each of three 
directions: parallel to the image plane, 45 degrees to the image plane, and 90 degrees to the image plane. The 
length of each sequence varies with the individual's walking speed [70]. Figure 7 presents example frames 
from the CASIA A dataset viewed from each angle, with Figure 7(a) corresponding to the parallel view, 
Figure 7(b) to 90 degrees, and Figure 7(c) to 45 degrees. 


Figure 7. Example frames from CASIA A: (a) parallel view, (b) 90 degrees, and (c) 45 degrees 


— CASIA B: Created in 2005, CASIA B contains recordings of 124 distinct people from a total of 11 
different viewpoints. Each scenario was performed three times, each time with a different variation, such 
as a change in clothing or walking pace. In addition, the dataset contains a set of silhouettes for each 
sequence, which are used for foreground segmentation. Figure 8 and Figure 9 illustrate different angles 
and clothing choices [71]. 

— CASIA C: Acquired in 2005, the CASIA C dataset includes 153 subjects who were filmed by infrared 
(thermal spectrum) cameras while walking in one of four different variations: normal walking, slow 
walking, fast walking, or normal walking while carrying a backpack [72]. Examples are shown in 
Figure 10. 


Figure 8. Example frames from CASIA B dataset 
— CASIA D: The CASIA gait-footprint dataset consists of images together with information on cumulative 
foot pressure. It contains 3,496 gait pose images and 2,658 cumulative foot pressure images [73]. Figure 8 
depicts example frames from the CASIA B dataset; each image corresponds to one of the 11 angles that 
were considered [74]. 

Variation is introduced through clothing as well as carried items such as bags and other accessories. 
The data collection conditions are shown in Figure 9. For each person under each viewing angle, 10 videos 
are provided: 2 videos with a long coat, shown in Figure 9(a); 2 videos with a bag, shown in Figure 9(b); and 
6 videos in normal clothing, shown in Figure 9(c). The silhouettes for datasets A, B, and C can be downloaded 
free of charge. To obtain the complete data, prospective users are required to fill out a form and wait for 
authorization from the dataset's owner. Figure 10 depicts example frames from the CASIA C dataset. The 
images were obtained with an infrared camera at night and with variations in the manner of walking. 
Figure 10(a) shows walking without a bag and Figure 10(b) walking with a bag. 
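
The size of CASIA B implied by the figures above (124 people, 11 viewing angles, and 10 videos per person and angle) can be checked with a short sketch. The spacing of the angles (0 to 180 degrees in steps of 18) and the condition labels are assumptions made for illustration and are not stated in the text.

```python
from itertools import product

NUM_SUBJECTS = 124
VIEW_ANGLES = list(range(0, 181, 18))  # assumption: 11 views, 18 degrees apart
CONDITIONS = {"normal": 6, "bag": 2, "coat": 2}  # videos per person and view


def enumerate_sequences():
    """Yield (subject, view_angle, condition, take) for every expected video."""
    for subject, angle in product(range(1, NUM_SUBJECTS + 1), VIEW_ANGLES):
        for condition, takes in CONDITIONS.items():
            for take in range(1, takes + 1):
                yield subject, angle, condition, take


index = list(enumerate_sequences())
# Number of expected videos per walking condition.
per_condition = {c: sum(1 for s in index if s[2] == c) for c in CONDITIONS}
```

An index of this kind is a convenient starting point for defining gallery and probe splits per viewing angle and per condition.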




Figure 9. Example frames from CASIA B dataset, where each image corresponds to one of the 11 angles: 
(a) 2 videos with a long coat, (b) 2 videos with a bag, and (c) 6 videos in normal styles 




Figure 10. Example frames from CASIA C where: (a) shows walking without bag and (b) walking with bag 


3.5. The OU-ISIR biometric database and some of its applications 
Since 2007, researchers at Osaka University's Institute of Scientific and Industrial Research (ISIR) 
have been collecting the largest dataset for gait identification assembled anywhere in the world. It is organised 
into eight separate datasets, which are described below. 
Treadmill dataset: This collection is made up of sequences of individuals walking on an electronic 
treadmill while being filmed by a total of 25 cameras at a rate of 60 frames per second with a resolution of 
640 by 480. It has four distinct subsets: 

a) Treadmill dataset A - Speed variation: This subset includes 34 people in a lateral view at speeds that 
range from 2 to 10 km/h in 1 km/h intervals [75]. 

b) Treadmill dataset B - Clothes variation: This dataset consists of 68 persons in a lateral view, each wearing 
one of 32 different outfits. The images were acquired synchronously with the footstep data [76]. 

c) Treadmill dataset C - View: An expansive database that includes 168 individuals ranging in age from 4 
to 75 years old. The subset is made up of 25 views obtained using a multi-view synchronous gait 
capture system [77]. 

d) Treadmill dataset D - Gait fluctuation: This collection is made up of gait silhouette sequences of 185 
participants. They were observed from a lateral viewpoint, and the subject's speed changed during the 
sequences [78]. Based on the high-speed and low-speed variations, the data were split into two groups, 
each consisting of 100 participants (with an overlap of 15 individuals). 

Figure 11 depicts example frames from the CASIA D dataset. 

Large population dataset: The large population dataset was collected through various outreach 
events. A total of 4,016 subjects were filmed from four different camera angles. The video was recorded at 
30 frames per second with a resolution of 640 by 480 pixels [79]. 

Speed transition dataset: The speed transition dataset [80] has two subsets: 

a) Dataset A consists of scenes of persons walking at a constant speed of 4 kilometers per hour on a 
treadmill or on the ground. There are 179 such scenes. In this collection, the background was covered 
with wallpaper to eliminate any distracting elements. 

b) Dataset B consists of sequences collected from 25 individuals who walked on a treadmill at speeds 
ranging from 1 to 5 kilometers per hour. Each subject was recorded twice. Acceleration and deceleration 
were each carried out over the course of three seconds, and the middle sequences, each lasting one 
second, were taken from both processes. 




Figure 11. Example frames from CASIA D 


Multi-view large population dataset: This dataset is made up of 10,307 subjects, 5,114 male and 
5,193 female. The participants' ages span from 2 to 87 years, and the dataset was designed for cross-view 
gait recognition algorithms. The images were captured at a resolution of 1,280 by 980 pixels and a frame rate 
of 25 frames per second from 14 distinct camera angles. The capture equipment was positioned at a lateral 
distance of 8 meters and at a height of 5 meters [81]. 

Large population dataset with bag: This dataset focuses on gait recognition for individuals carrying 
objects, with the aim of taking biometric information into account and identifying the position of the carried 
object relative to the body, along with its distance from the body. Specifically, the dataset examines people 
who are walking while carrying bags. The images were taken by a camera placed at a distance of roughly 
8 meters and at a height of 5 meters. The ages of the persons in this dataset range from 2 to 95 years. The 
sequences were captured at a resolution of 1,280 by 980 pixels and a frame rate of 25 frames per second. Each 
person was recorded three times: in the first sequence, A1, the person may or may not be holding an item, 
while in the second and third sequences the person carries nothing. When an item is carried, four regions (the 
front, the rear, the lower side, and the upper side) are designated as carrying areas. Each video is also provided 
with its own binary mask, which may be used to remove the background if desired [82]. 

Large population dataset with age: The objective of this dataset is to evaluate gait recognition in 
relation to people's age and gender. It includes the motion data of 62,846 people walking along a 
predetermined route while being filmed by cameras with a resolution of 640 by 480 pixels and a frame rate 
of 30 frames per second. All of the videos include binary masks obtained by background removal, and the 
ages of the persons in the sequences range from 2 to 90 years [83]. 

Inertial sensor dataset: This is one of the largest datasets containing gait information captured by 
inertial sensors. It was designed for the research and evaluation of methods that identify an individual from 
movement recorded by motion sensors and accelerometers. The database is made up of data collected from 
744 subjects (389 males and 355 females) with ages ranging from 2 to 78 years [84]. 

Similar actions inertial dataset: The 460 individuals that make up this dataset range in age from 8 to 
78, and the gender distribution is close to even. Their walking characteristics were measured alongside the 
data reported. In addition, this dataset includes labels for the walking surface: invalid, flat, stair ascending, 
stair descending, ramp ascending, and ramp descending [85]. 


3.6. The dataset from the University of South Florida 

The dataset from the University of South Florida (USF) includes 1,870 sequences from 122 people. 
The subjects wore one of two distinct kinds of shoes. The dataset also covers people carrying or not carrying 
a briefcase, different surface conditions such as grass and concrete, and different camera viewpoints (left or 
right). The videos were shot outdoors at two distinct times in order to capture variation over time [86]. 


3.7. Dataset from the University of Southampton 
The Southampton human ID at a distance (SOTON) database is a project undertaken by the 
University of Southampton. The database comprises three primary components: 

- SOTON small database: consists of 12 participants who walked around an indoor track at varied speeds, 
wearing a variety of shoes and clothing, and either carrying or not carrying bags [87]. 

- SOTON large database: includes 114 participants who walked both indoors and outdoors, as well as on a 
treadmill and on a track within the laboratory. Images were captured from six distinct camera viewpoints 
and compiled into a collection of more than 5,000 individual sequences [88]. 

- SOTON temporal: the data were collected using the multi-biometric tunnel, which is equipped with 12 
synchronized cameras, to record the gait of individuals over time. The collection is made up of several 
dynamic scenes, each with its own background, lighting, walking surface, and camera placement. A total 
of 25 people are included in the dataset, 17 male and 8 female, with ages spanning from 20 to 55 years. 
Note that none of the subjects wear shoes in any of the scenes [89]. 


3.8. AVA dataset with multiple views for use in gait recognition (AVAMVG) 

The AVA multiview dataset for gait recognition (AVAMVG) was created specifically for gait 
recognition algorithms based on 3D measurements. It contains gait sequences of 20 actors following diverse 
trajectories. The sequences were obtained using cameras that had been precisely calibrated for the task and 
were then post-processed with 3D image reconstruction methods. In addition, a binary silhouette that may be 
used for segmentation is provided for each sequence. In total, the database stores 200 videos with six 
simultaneous views. These videos can also be used as 1,200 single-view videos (that is, 6 times 200) [90]. 
Figure 12 illustrates various samples. 


Figure 12. Example of the multiview dataset: people walking in different directions, seen from multiple 
points of view 
3.9. Kyushu University 4D gait database (KY4D) 

The Kyushu University 4D Gait Database (KY4D) is made up of sequential 3D models and image 
sequences of 42 persons walking along a total of six different trajectories, four of which are straight and two 
of which are curved [91]. Dataset A (Straight) consists of sequential 3D models and images of people walking 
along straight paths. Dataset B (Curved) is made up of image sequences of individuals walking along curved 
trajectories, together with the corresponding 3D reconstructed models; the radius r of the curve is either 
1.5 metres or 3.0 metres. Figure 13 depicts an example of the studio generated from the KY4D Gait Database: 
Figure 13(a) illustrates the trajectory, which is shown as a red arrow, and Figure 13(b) shows sequential 3D 
reconstructed models of a person walking straight. Figure 14(a) and Figure 14(b) depict examples of the 
studio generated from the KY4D Gait Database B. 


Figure 13. An example of the studio generated from the KY4D Gait Database where: (a) is an illustration of the 
trajectory, which is shown as a red arrow in the figure and (b) illustrates numerous 3D reconstructed models 




Figure 14. Examples (a) and (b) of the studio generated from the KY4D Gait Database B 


KY infrared (IR) shadow gait database: This database is made up of time-series shadow images of 
54 different people [92]. Figure 15 depicts the experimental setting and the actual scene of the data 
collection. Everyone walks in a straight line, as shown by the arrow indicating the walking direction in 
Figure 15(a). Figure 15(b) shows the camera and the two infrared lamps used to collect the data for the shadow 
database. The infrared lamps were mounted at an angle, and the camera was mounted on the ceiling, 
perpendicular to the floor. Figure 16 shows an example of the results obtained from these captures. 
Figure 15. Experimental setting and actual scene where (a) provides a description of the variation of radius and 
(b) illustrates sequential 3D models showing the trajectory and path that a person follows as they walk along it 


Figure 16. An example movie from IR shadow images 


3.10. WhuGAIT datasets 


The findings presented in the research may be reproduced by using the whuGAIT datasets, which 
were made publicly accessible by Wuhan University in 2018 together with the source code and pre-trained 
models. In contrast to prior data collections, whuGAIT contains data from 3-axis accelerometers and 3-axis 
gyroscopes acquired from 118 individuals, 20 of whom were recorded over the course of three days and 98 
in a single day [93]. According to the purpose of the analysis, the dataset has been segmented into the 
following six distinct subsets: 




Dataset #1: This subset is made up of 33,104 samples for training and 3,740 for testing. The samples come 
from 118 different people and were segmented in a two-step process. 

Dataset #2: Very similar to Dataset #1, this subset also uses two-step segmentation. It contains 49,275 samples 
for training and 4,936 for testing, derived from the data gathered from 20 individuals over the course of three 
days. 

Dataset #3: This subset is segmented into fixed-length time windows, with each sample spanning 2.56 
seconds. It contains 26,283 samples for training and 2,991 for testing. 

Dataset #4: Like Dataset #3, this subset is divided into time windows of 2.56 seconds; however, it uses the 
data obtained from 20 different people over the course of three days. It contains 35,373 samples for training 
and 3,941 for testing. 

Dataset #5: This subset is used for authentication. It is made up of 74,142 examples contributed by 118 
people, with the data from 98 of them used for training and the remaining 20 for validation. Each 
authentication example is a pair of samples obtained from either one individual or two distinct individuals. 
The examples include gyroscope data and acceleration measurements segmented in the two-step process. 

Dataset #6: This subset has the same structure as Dataset #5, but it uses vertical alignment rather than 
horizontal alignment. 
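
The fixed-length segmentation used for Datasets #3 and #4 can be sketched as follows. Only the window length of 2.56 seconds comes from the description above; the sampling rate of 50 Hz, the optional overlap parameter, and the function name `fixed_windows` are assumptions made for illustration.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 50   # assumption: the sampling rate is not stated in the text
WINDOW_SECONDS = 2.56   # window length given for Datasets #3 and #4


def fixed_windows(signal: Sequence[Sequence[float]],
                  rate_hz: int = SAMPLING_RATE_HZ,
                  seconds: float = WINDOW_SECONDS,
                  overlap: float = 0.0) -> List[List[Sequence[float]]]:
    """Cut a recording (one row per time step) into fixed-length windows."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    size = round(rate_hz * seconds)                # samples per window
    step = max(1, round(size * (1.0 - overlap)))   # hop between window starts
    # Trailing samples that do not fill a whole window are dropped.
    return [list(signal[start:start + size])
            for start in range(0, len(signal) - size + 1, step)]


# Hypothetical 10-second recording with 6 channels
# (3 accelerometer axes and 3 gyroscope axes).
recording = [[0.0] * 6 for _ in range(10 * SAMPLING_RATE_HZ)]
windows = fixed_windows(recording)
half_overlapping = fixed_windows(recording, overlap=0.5)
```

Unlike the two-step segmentation of Datasets #1 and #2, this windowing does not depend on detecting individual steps, which makes it simpler but means that windows are not aligned to the gait cycle.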


4. DATASETS IN THEIR APPROPRIATE USE CONTEXTS 
The previous section presented the datasets that are most often used for gait recognition and that 
were used in the studies considered for this review. This section provides a general overview of their use 
contexts, acquisition environments, and spectrum. The relevant information for each dataset is shown in 
Table 3. The variables were broken down into 12 primary categories: viewpoint, pace, objects, shoes, clothing, 
time, surface, silhouette, gait fluctuation, treadmill walking, overground walking, and foot pressure [94]. In 
addition, the acquisition environments of the datasets are classified as static indoor, static outdoor, dynamic 
indoor, dynamic outdoor, or dynamic both indoors and outdoors. 


Table 3. Datasets in usage contexts 

Context/Database 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 
Viewpoint x 
Pace 
Objects 
Shoe 
Clothing 
Time 
Surface x xX x 
Silhouette x x 
Gait Fluctuation 
Treadmill walking 
Overground walking 
Foot pressure x 
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5. CONCLUSIONS AND FUTURE DIRECTIONS 

This article presented a comprehensive review of the methodologies used in the crucial stages of gait 
recognition, as well as current developments in gait detection and identification approaches in the context of 
the COVID-19 pandemic. Gait recognition involves three key concerns: gait representation, feature 
dimensionality reduction, and gait classification. In addition to highlighting the latest advances in deep 
learning-based gait recognition technology, this survey reviewed the most important works on gait 
identification published in the previous several years. It also offered an in-depth overview of the nine most 
widely used datasets, along with the extraction procedure and the limitations of each dataset. These datasets 
show that efforts are being made toward recognition that is unaffected by gender, viewing angle, clothing and 
shoes worn, the carrying of bags and other goods, and several other commonly encountered factors. This paper 
also examined the primary issues that the scientific community has addressed and the efforts it has made to 
develop approaches that can help secure access to monitored resources during the COVID-19 pandemic via gait 
identification. In addition, it investigated the benefits of a gait-based method in comparison to commonly 
used biometric characteristics. For example, gait systems do not require users to consent to being recognized, 
whereas other biometric modalities do. In this context, the aim of the review is to emphasize the advantages 
of gait biometrics over other biometric approaches and to highlight the most recent methodologies and 
state-of-the-art architectures. Several aspects of video-based models and databases must be kept in mind in 
order to maximize their effectiveness. 

The datasets offered for training or validation do not include more than one person in each video. 
Additionally, the surroundings in which the datasets were recorded are completely under the control of the 
researchers. In some of the sets the viewing angles differ, yet there are no changes to the objects or even 
the colors in the background. In some other videos, the subjects are shown walking on electronic treadmills, 
which suggests an even higher level of control over both their motions and their surroundings. 

As a consequence, the models reviewed in this study are likely to perform poorly when deployed in 
real-world systems, for example when identifying persons of interest who are walking in public spaces. Owing 
to emerging trends, the number of works on the following topics is expected to increase in the coming 
years: 

a) Transformer networks: This method targets data streams such as video and performs very well in the 
context of gait detection [95]. 

b) Recognizing a person's gender and age: This strategy has become more popular over the last several 
years owing to its significance in a variety of applications [96]. 

c) Monitoring of hazardous environments: Gait features are well suited to monitoring hazardous 
environments because these environments typically have limited illumination. In order to obtain better 
biometric identification characteristics, it is necessary to obtain more descriptive data [97]. 
d) The presence of more than one person in the scene: The majority of the research on gait recognition has 
focused on identifying a single person in controlled environments rather than on scenes in which several 
people are present at the same time. 

However, real-life situations often demand solutions that can withstand uncontrolled conditions in 
which more than one person is present. In addition, we found a growing need for gait identification of people 
who are wearing various garments, carrying objects, or wearing face masks during the COVID-19 pandemic, as 
well as for hybrid techniques that combine gait with other biometric traits, such as the face and the ear. 